# Report on the MatMul Simulation for Project 2 in Computer Architecture, by Kenneth Nnadi and Quang Dang

### Part 1 table

```
Part 1:
Cache Hits Misses AvgMissLatency Hit_Percentage
Unified L1 cache 11810614 3490 89,913.75 99.97
```

For this part, the cache size is 256KB with an associativity of 2.
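The Hit_Percentage column appears consistent with hits divided by total accesses (hits plus misses). The following is a minimal sketch of that arithmetic, assuming this definition; the function name `hit_percentage` is ours for illustration and is not taken from the project scripts.

```python
def hit_percentage(hits: int, misses: int) -> float:
    """Return hits as a percentage of all accesses (hits + misses)."""
    accesses = hits + misses
    if accesses == 0:
        raise ValueError("no accesses recorded")
    return 100.0 * hits / accesses


# Hit and miss counts copied from the Part 1 table.
part1 = hit_percentage(11810614, 3490)
print(f"{part1:.2f}")
```

Rounded to two decimals, this gives the 99.97 reported in the table.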

### Part 2 table



The cache size for both the icache and the dcache is 256KB, with an associativity of 2.

### Part 3:

```
Part 3:
Cache Hits Misses AvgMissLatency Hit_Percentage
Unified L1 cache (Associativity+1) 11810614 3490 89,913.75 99.97
Unified L1 cache (Associativity+2) 11810614 3490 89,913.75 99.97
Unified L1 cache (Associativity+4) 11810632 3472 89,985.31 99.97
Unified L1 cache (Associativity+6) 11810635 3468 99,878.72 99.97
Unified L1 cache (Associativity+1) 11810635 3468 99,878.72 99.97
Unified L1 cache (Associativity+2) 11810635 3468 99,878.72 99.97
Unified L1 cache (Associativity+3) 11810635 3466 99,125.50 99.97
L1 instruction caches (Associativity+1) 8497648 1189 75,756.09 99.93
L1 instruction caches (Associativity+2) 8497655 1182 76,206.42 99.93
L1 instruction caches (Associativity+8) 8497655 1182 76,206.42 99.93
L1 instruction caches (Associativity+8) 8497655 1182 76,206.42 99.93
L1 instruction caches (Associativity+8) 8497655 1182 76,206.42 99.93
L1 instruction caches (Associativity+3) 8497655 1182 76,206.42 99.93
L1 instruction caches (Associativity+3) 8497655 1182 76,206.42 99.93
L1 data caches (Associativity+2) 8497655 1182 76,206.42 99.93
L1 data caches (Associativity+2) 8497655 1182 76,206.42 99.93
L1 data caches (Associativity+3) 849765 1182 76,206.42 99.93
L1 data caches (Associativity+3) 849765 1182 76,206.42 99.93
```



In this plot, we fix the size of the icache and dcache at 256KB. In the hit rate vs. associativity plot, the dcache hit rate initially increased slightly and became stable once associativity reached 2. In the average miss latency vs. associativity plot, beyond an associativity of 2 the icache miss latency increased slightly, while the dcache miss latency dropped and became stable. The drop in dcache miss latency makes sense, because the dcache hit rate improved, which led to lower miss latency. Surprisingly, the icache miss latency increased even though its hit rate remained mostly unchanged.



Similar to the separate cache scheme, the unified cache hit rate also increased and then stabilized once associativity reached 2.
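To illustrate why hit rate can jump and then level off as associativity grows at a fixed cache size, here is a minimal sketch of a toy LRU set-associative cache. It is not the gem5 cache model and does not replay the matmul trace. The 1 KB size, the 64-byte line, and the synthetic access pattern (two blocks placed exactly one cache size apart, so that they conflict when direct-mapped) are all assumptions chosen for illustration.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy LRU set-associative cache that only counts hits and misses."""

    def __init__(self, size_bytes: int, assoc: int, line_bytes: int = 64):
        self.line_bytes = line_bytes
        self.assoc = assoc
        self.num_sets = size_bytes // (line_bytes * assoc)
        # One ordered dict per set; insertion order tracks LRU order.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> bool:
        line = addr // self.line_bytes
        ways = self.sets[line % self.num_sets]
        if line in ways:
            ways.move_to_end(line)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(ways) >= self.assoc:
            ways.popitem(last=False)  # evict the least recently used line
        ways[line] = True
        return False


def hit_rate(assoc: int, size_bytes: int = 1024) -> float:
    """Hit percentage for a synthetic trace that alternates between two blocks."""
    cache = SetAssociativeCache(size_bytes, assoc)
    for _ in range(10):
        for offset in range(0, 256, 8):
            cache.access(offset)               # element of block A
            cache.access(size_bytes + offset)  # conflicting element of block B
    return 100.0 * cache.hits / (cache.hits + cache.misses)


rates = {assoc: hit_rate(assoc) for assoc in (1, 2, 4, 8)}
for assoc, rate in rates.items():
    print(f"assoc={assoc}: {rate:.2f}%")
```

In this toy trace the direct-mapped cache thrashes, while any associativity of 2 or more leaves only compulsory misses, so the hit rate stops changing. This matches the qualitative shape described above, although the real workload's conflicts are far milder.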

### Part 4:

| Configuration                                      | Dcache_Hits | Dcache_Misses | Dcache_AvgMissLatency | Icache_Hits | Icache_Misses | Icache_AvgMissLatency | L2cache_Hits | L2cache_Misses | L2cache_AvgMissLatency | Dcache_Hit_Percentage | Icache_Hit_Percentage | L2cache_Hit_Percentage |
|----------------------------------------------------|-------------|---------------|-----------------------|-------------|---------------|-----------------------|--------------|----------------|------------------------|-----------------------|-----------------------|------------------------|
| Size=(l1i_size=64kB, l1d_size=128kB, l2_size=2MB)  |             |               | 79,097.59             |             |               | 76,286.43             |              |                | 169,391.58             |                       |                       |                        |
| Size=(l1i_size=64kB, l1d_size=128kB, l2_size=4MB)  |             |               | 79,097.59             |             |               | 76,286.43             |              |                | 169,391.58             |                       | 99.99                 |                        |
| Size=(l1i_size=64kB, l1d_size=256kB, l2_size=2MB)  |             |               | 79,131.79             |             |               | 76,286.43             |              |                | 169,391.58             |                       |                       |                        |
| Size=(l1i_size=64kB, l1d_size=256kB, l2_size=4MB)  |             |               |                       |             |               | 76,286.43             |              |                | 169,227.80             |                       |                       |                        |
| Size=(l1i_size=128kB, l1d_size=128kB, l2_size=2MB) |             |               | 79,097.59             |             |               |                       |              |                | 169,227.80             |                       |                       |                        |
| Size=(l1i_size=128kB, l1d_size=128kB, l2_size=4MB) |             |               | 79,097.59             |             |               | 76,286.43             |              |                | 169,227.80             |                       |                       |                        |
| Size=(l1i_size=128kB, l1d_size=256kB, l2_size=2MB) |             |               |                       |             |               |                       |              |                | 169,389.85             |                       | 99.99                 |                        |
| Size=(l1i_size=128kB, l1d_size=256kB, l2_size=4MB) |             |               |                       |             |               | 76,286.43             |              |                | 169,389.85             |                       |                       |                        |
| Size=(l1i_size=256kB, l1d_size=128kB, l2_size=2MB) |             |               | 79,097.59             |             |               | 76,286.43             |              |                | 169,389.85             |                       |                       |                        |
| Size=(l1i_size=256kB, l1d_size=128kB, l2_size=4MB) | 3312982     |               | 79,097.59             |             |               | 76,286.43             |              |                | 169,488.75             |                       |                       |                        |
| Size=(l1i_size=256kB, l1d_size=256kB, l2_size=2MB) |             |               |                       |             |               | 76,286.43             |              |                | 169,488.75             |                       |                       |                        |
| Size=(l1i_size=256kB, l1d_size=256kB, l2_size=4MB) |             |               | 79,131.79             |             |               |                       |              |                | 169,488.75             |                       |                       |                        |
| Size=(l1i_size=64kB, l1d_size=64kB, l2_size=2MB)   |             |               | 79,097.59             |             |               | 76,286.43             |              |                | 169,313.44             |                       |                       |                        |
| Size=(l1i_size=64kB, l1d_size=64kB, l2_size=4MB)   |             |               | 79,097.59             |             |               | 76,286.43             |              |                | 169,313.44             |                       |                       |                        |
| Size=(l1i_size=64kB, l1d_size=32kB, l2_size=2MB)   |             |               | 79,131.79             |             |               | 76,286.43             |              |                | 169,313.44             |                       |                       |                        |
| Size=(l1i_size=64kB, l1d_size=32kB, l2_size=4MB)   |             |               |                       |             |               | 76,286.43             |              |                | 169,628.60             |                       |                       | 0.83                   |
| Size=(l1i_size=32kB, l1d_size=64kB, l2_size=2MB)   |             |               | 79,097.59             |             |               |                       |              |                | 169,628.60             |                       | 99.99                 | 0.83                   |
| Size=(l1i_size=32kB, l1d_size=64kB, l2_size=4MB)   |             |               | 79,097.59             |             |               | 76,286.43             |              |                | 169,628.60             |                       |                       | 0.83                   |
| Size=(l1i_size=32kB, l1d_size=32kB, l2_size=2MB)   |             |               |                       |             |               |                       |              |                | 169,666.67             |                       |                       |                        |
| Size=(l1i_size=32kB, l1d_size=32kB, l2_size=4MB)   |             |               |                       |             |               | 76,286.43             |              |                | 169,666.67             |                       |                       |                        |
| Size=(l1i_size=128kB, l1d_size=64kB, l2_size=2MB)  |             |               | 79,097.59             |             |               | 76,286.43             |              |                | 169,666.67             |                       |                       |                        |
| Size=(l1i_size=128kB, l1d_size=64kB, l2_size=4MB)  |             |               | 79,897.59             |             |               | 76,286.43             |              |                |                        |                       | 99.99                 |                        |
| Size=(l1i_size=128kB, l1d_size=32kB, l2_size=2MB)  |             |               | 79,131.79             |             |               | 76,286.43             |              |                |                        |                       |                       |                        |
| Size=(l1i_size=128kB, l1d_size=32kB, l2_size=4MB)  |             |               | 79,131.79             |             |               | 76,286.43             |              |                |                        |                       | 99.99                 |                        |
| Size=(l1i_size=256kB, l1d_size=64kB, l2_size=2MB)  |             |               | 79,897.59             |             |               | 76,286.43             |              |                | 169,158.59             |                       |                       | 0.89                   |
| Size=(l1i_size=256kB, l1d_size=64kB, l2_size=4MB)  | 3312982     |               | 79,897.59             | 8497655     |               | 76,286.43             |              |                | 169,158.59             |                       |                       | 0.09                   |
| Size=(l1i_size=256kB, l1d_size=32kB, l2_size=2MB)  | 3312982     |               | 79,131.79             |             |               | 76,286.43             |              |                | 169,158.59             |                       |                       | 8.89                   |
| Size=(l1i_size=256kB, l1d_size=32kB, l2_size=4MB)  |             |               |                       |             |               | 76,286.43             |              |                | 169,158.59             |                       |                       | 0.09                   |





Regardless of which cache size we varied in the Part 4 simulation, the hit rate and the average miss latency remain essentially unchanged. The hit rate of L2 is significantly lower than that of the icache and dcache. The reason is that the matmul (matrix multiplication) operation repeatedly executes the same instructions on the same set of inputs. After the first few cache misses, the CPU no longer needs to go to L2 to fetch instructions and data. It keeps operating on the instructions and data already held in L1, which leads to a very high L1 hit rate. L2 then suffers from a high miss rate, since only the first few requests, which miss in L1, reach L2 (and typically miss there as well, because the line has not been loaded yet), and afterwards barely any request makes it to L2.
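The filtering effect described here can be reproduced with a minimal two-level sketch. It assumes fully associative LRU caches and a small working set that fits in L1; the capacities, the working-set size, and the `LRUCache` class are illustrative assumptions, not the gem5 configuration used in the experiments.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative LRU cache that counts hits and misses."""

    def __init__(self, num_lines: int):
        self.num_lines = num_lines
        self.lines = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, line: int) -> bool:
        if line in self.lines:
            self.lines.move_to_end(line)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(self.lines) >= self.num_lines:
            self.lines.popitem(last=False)  # evict the least recently used line
        self.lines[line] = True
        return False


l1 = LRUCache(num_lines=64)
l2 = LRUCache(num_lines=1024)

# Assumed working set: 32 cache lines reused 1000 times, small enough for L1.
for _ in range(1000):
    for line in range(32):
        if not l1.access(line):
            l2.access(line)  # only L1 misses are forwarded to L2

l1_rate = 100.0 * l1.hits / (l1.hits + l1.misses)
l2_rate = 100.0 * l2.hits / (l2.hits + l2.misses)
print(f"L1 hit rate: {l1_rate:.2f}%  L2 hit rate: {l2_rate:.2f}%")
```

Here L2 only ever sees the first-touch misses, so it has almost nothing to hit on. The small nonzero L2 hit percentages in the table suggest the real system has some reuse at L2 that this toy model does not capture.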

### Note:

In our zip file, we provided the following code:

- 1. Code to generate the table -----> gem5\_result\_table.py
- 2. Code to generate the plots -----> generate\_plots.py
- 3. gem5 Python config scripts -----> simple\_cache.py and cache\_working.py
- 4. Bash script to run the gem5 simulation -----> cache\_sim.sh

### Instructions to run the code:

- 1. Build gem5.opt either in Docker or on a virtual machine.
- 2. Extract the zip file to a proj2 folder.
- 3. Move the proj2 folder to the following path: /gem5/src/learning-gem5/
- 4. Run cache\_sim.sh to generate the four output files, one for each part (1-4), called filtered\_output\_p(1-4).txt. Output files are already provided for a matmul run with the input r1=r2=c1=c2=50.
- 5. Run gem5\_result\_table.py to generate the table.
- 6. Run generate\_plots.py to generate the plots.
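As a rough idea of what the table-generation step involves, the sketch below parses "name value" statistic lines and derives a hit percentage. It is not the contents of gem5\_result\_table.py. The stat names in `SAMPLE` and the line layout are assumptions (the actual format of the filtered output files is not shown in this report); the two counts are the icache hits and misses reported above.

```python
# Hypothetical sample lines; the stat names and layout are assumed, not taken
# from the real filtered_output files.
SAMPLE = """\
system.cpu.icache.overall_hits::total      8497655   # assumed stat name
system.cpu.icache.overall_misses::total       1182   # assumed stat name
"""


def parse_stats(text: str) -> dict:
    """Map each stat name to its numeric value, ignoring trailing comments."""
    stats = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        try:
            stats[fields[0]] = float(fields[1])
        except ValueError:
            continue  # skip lines whose second field is not a number
    return stats


stats = parse_stats(SAMPLE)
hits = stats["system.cpu.icache.overall_hits::total"]
misses = stats["system.cpu.icache.overall_misses::total"]
pct = 100.0 * hits / (hits + misses)
print(f"icache hit percentage: {pct:.2f}")
```

With these counts the result rounds to 99.99, in line with the Icache_Hit_Percentage entries in the Part 4 table.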